Information Theory-Guided Heuristic Progressive Multi-View Coding
Multi-view representation learning aims to capture comprehensive information
from multiple views of a shared context. Recent works intuitively apply
contrastive learning to different views in a pairwise manner, which still
has limitations: view-specific noise is not filtered when learning view-shared
representations; fake negative pairs, whose negative terms actually belong to
the same class as the positive, are treated the same as real negative pairs;
and measuring the similarities between all terms evenly might
interfere with optimization. Importantly, few works study the theoretical
framework of generalized self-supervised multi-view learning, especially for
more than two views. To this end, we rethink the existing multi-view learning
paradigm from the perspective of information theory and then propose a novel
information-theoretic framework for generalized multi-view learning. Guided
by this framework, we build a multi-view coding method with a three-tier progressive
architecture, namely Information theory-guided hierarchical Progressive
Multi-view Coding (IPMC). In the distribution tier, IPMC aligns the
distributions of different views to reduce view-specific noise. In the set tier, IPMC
constructs self-adjusted contrasting pools, which are adaptively modified by a
view filter. Lastly, in the instance tier, we adopt a purpose-designed unified loss to
learn representations and reduce gradient interference. Theoretically and
empirically, we demonstrate the superiority of IPMC over state-of-the-art
methods.
Comment: This paper has been accepted by the journal Neural Networks (Elsevier),
2023. A revised manuscript of arXiv:2109.0234
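To make the critique concrete, the pairwise contrastive setup described above can be sketched in plain Python. This is a minimal illustration, not IPMC itself: it sums an InfoNCE loss over every ordered pair of views. The `keep` argument is a hypothetical stand-in for a view filter that drops suspected fake negatives from a sample's contrasting pool. The function names and the temperature value are assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def info_nce(anchor_view, other_view, temperature=0.5, keep=None):
    """Mean InfoNCE loss: anchor_view[i] should match other_view[i] (the positive).

    keep[i][j] == False (hypothetical view-filter output) removes candidate j
    from sample i's contrasting pool; the positive is never removed.
    """
    n = len(anchor_view)
    total = 0.0
    for i in range(n):
        pos_logit = None
        logits = []
        for j in range(n):
            if j != i and keep is not None and not keep[i][j]:
                continue  # suspected fake negative: excluded from the pool
            logit = cosine(anchor_view[i], other_view[j]) / temperature
            logits.append(logit)
            if j == i:
                pos_logit = logit
        m = max(logits)  # log-sum-exp with max subtraction for numerical stability
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - pos_logit
    return total / n


def pairwise_multiview_loss(views, temperature=0.5, keep=None):
    """Sum of InfoNCE over every ordered pair of distinct views."""
    loss = 0.0
    for a, view_a in enumerate(views):
        for b, view_b in enumerate(views):
            if a != b:
                loss += info_nce(view_a, view_b, temperature, keep)
    return loss


# Usage: three views of the same two samples (toy 2-D embeddings).
views = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.9, 0.1], [0.1, 0.9]],
    [[1.0, 0.2], [0.2, 1.0]],
]
print(pairwise_multiview_loss(views))
```

Removing a candidate from the pool only shrinks the softmax denominator, so masking out suspected fake negatives can never raise the loss. With every negative removed, each term collapses to zero.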
M2HGCL: Multi-Scale Meta-Path Integrated Heterogeneous Graph Contrastive Learning
Inspired by the successful application of contrastive learning on graphs,
researchers have attempted to apply graph contrastive learning approaches to
heterogeneous information networks. Unlike homogeneous graphs, heterogeneous
graphs contain diverse types of nodes and edges, so specialized
graph contrastive learning methods are required. Most existing methods for
heterogeneous graph contrastive learning are implemented by transforming
heterogeneous graphs into homogeneous graphs, which may undermine the
valuable information carried by non-target nodes and thereby
degrade the performance of contrastive learning models. Additionally,
current heterogeneous graph contrastive learning methods are mainly based on
the initial meta-paths given by the dataset, yet our in-depth
exploration leads to two empirical conclusions: the initial meta-paths alone cannot
contain sufficient discriminative information, and various types of
meta-paths can effectively improve the performance of heterogeneous graph
contrastive learning methods. To this end, we propose a new multi-scale
meta-path integrated heterogeneous graph contrastive learning (M2HGCL) model,
which discards the conventional heterogeneity-homogeneity transformation and
performs the graph contrastive learning in a joint manner. Specifically, we
expand the meta-paths and jointly aggregate the direct neighbor information,
the initial meta-path neighbor information and the expanded meta-path neighbor
information to sufficiently capture discriminative information. A dedicated
positive sampling strategy is further introduced to remedy an intrinsic
deficiency of contrastive learning, namely the hard negative sampling
issue. Through extensive experiments on three real-world datasets, we
demonstrate that M2HGCL outperforms the current state-of-the-art baseline
models.
Comment: Accepted to ADMA 2023 as an oral presentation
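The meta-path expansion and joint aggregation described above can be sketched on a toy paper-author graph. This is an illustrative sketch under stated assumptions, not M2HGCL's encoder. Meta-path neighbors are obtained by chaining boolean relation matrices, with P-A-P as the initial meta-path and P-A-P-A-P as an expanded one. Each neighbor set is mean-aggregated, and the three views are simply concatenated. The graph, the mean aggregator, and the concatenation are all hypothetical choices.

```python
def transpose(m):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def matmul_bool(a, b):
    """Boolean product: out[i][k] is True if some j links i->j in a and j->k in b."""
    return [[any(a[i][j] and b[j][k] for j in range(len(b))) for k in range(len(b[0]))]
            for i in range(len(a))]


def metapath_adjacency(relations):
    """Chain relation matrices along a meta-path (square result), dropping self-loops."""
    adj = relations[0]
    for rel in relations[1:]:
        adj = matmul_bool(adj, rel)
    return [[bool(adj[i][k]) and i != k for k in range(len(adj[0]))]
            for i in range(len(adj))]


def mean_aggregate(adj, feats):
    """Mean of neighbor features per row of adj; zeros when a node has no neighbors."""
    out = []
    for row in adj:
        nbrs = [feats[k] for k, linked in enumerate(row) if linked]
        if nbrs:
            out.append([sum(col) / len(nbrs) for col in zip(*nbrs)])
        else:
            out.append([0.0] * len(feats[0]))
    return out


def joint_embedding(pa, paper_feats, author_feats):
    """Concatenate direct, initial (P-A-P) and expanded (P-A-P-A-P) neighbor views."""
    ap = transpose(pa)
    pap = metapath_adjacency([pa, ap])
    papap = metapath_adjacency([pa, ap, pa, ap])
    direct = mean_aggregate(pa, author_feats)      # heterogeneous 1-hop neighbors
    initial = mean_aggregate(pap, paper_feats)     # initial meta-path neighbors
    expanded = mean_aggregate(papap, paper_feats)  # expanded meta-path neighbors
    return [d + i + e for d, i, e in zip(direct, initial, expanded)]


# Toy graph: paper 0 - author 0, paper 1 - authors 0 and 1, paper 2 - author 1.
pa = [[1, 0], [1, 1], [0, 1]]
print(joint_embedding(pa, [[1.0], [2.0], [4.0]], [[10.0], [20.0]]))
```

In this toy graph, paper 2 is not a P-A-P neighbor of paper 0 but is reachable through P-A-P-A-P. That is the kind of additional context the expanded meta-paths are meant to contribute.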
MetaMask: Revisiting Dimensional Confounder for Self-Supervised Learning
As a successful approach to self-supervised learning, contrastive learning
aims to learn invariant information shared among distortions of the input
sample. While contrastive learning has yielded continuous advancements in
sampling strategy and architecture design, it still suffers from two persistent
defects: the interference of task-irrelevant information and sample
inefficiency, both of which are related to the recurring existence of trivial constant
solutions. From the perspective of dimensional analysis, we find that the
dimensional redundancy and dimensional confounder are the intrinsic issues
behind the phenomena, and provide experimental evidence to support our
viewpoint. We further propose a simple yet effective approach MetaMask, short
for the dimensional Mask learned by Meta-learning, to learn representations
against dimensional redundancy and confounder. MetaMask adopts the
redundancy-reduction technique to tackle the dimensional redundancy issue and
innovatively introduces a dimensional mask to reduce the gradient effects of
specific dimensions containing the confounder; the mask is trained with a
meta-learning paradigm whose objective is to improve the performance of
masked representations on a typical self-supervised task. We provide
theoretical analyses proving that MetaMask obtains tighter risk bounds for
downstream classification than typical contrastive methods. Empirically,
our method achieves state-of-the-art performance on various benchmarks.
Comment: Accepted by NeurIPS 202
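A redundancy-reduction objective combined with a per-dimension mask can be sketched as follows. The sketch assumes a Barlow Twins-style cross-correlation loss as the redundancy-reduction technique. It uses a fixed, hypothetical `mask` vector that down-weights the loss terms of selected dimensions. In MetaMask the mask is learned by meta-learning, which this sketch does not attempt. The weighting scheme and the `lam` default are assumptions.

```python
import math


def standardize(batch):
    """Zero-mean, unit-variance per dimension across the batch (std 0 is left as 1)."""
    n, dims = len(batch), len(batch[0])
    out = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        col = [row[d] for row in batch]
        mean = sum(col) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / n) or 1.0
        for b in range(n):
            out[b][d] = (batch[b][d] - mean) / std
    return out


def cross_correlation(z1, z2):
    """C[i][j]: batch-averaged product of standardized dim i of z1 and dim j of z2."""
    a, b = standardize(z1), standardize(z2)
    n, dims = len(a), len(a[0])
    return [[sum(a[k][i] * b[k][j] for k in range(n)) / n for j in range(dims)]
            for i in range(dims)]


def masked_redundancy_loss(z1, z2, mask, lam=0.005):
    """Barlow Twins-style loss whose per-dimension terms are scaled by a mask."""
    c = cross_correlation(z1, z2)
    loss = 0.0
    for i in range(len(c)):
        loss += mask[i] * (1.0 - c[i][i]) ** 2  # invariance: diagonal toward 1
        for j in range(len(c)):
            if i != j:
                # redundancy reduction: off-diagonal toward 0
                loss += lam * mask[i] * mask[j] * c[i][j] ** 2
    return loss


# Usage: the second view flips the sign pattern of dimension 1.
z1 = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
z2 = [[1.0, -1.0], [1.0, 1.0], [-1.0, -1.0], [-1.0, 1.0]]
print(masked_redundancy_loss(z1, z2, mask=[1.0, 1.0]))
print(masked_redundancy_loss(z1, z2, mask=[1.0, 0.0]))
```

Setting a mask entry to zero removes that dimension's terms, and with them its gradient contribution. That is the intuition behind masking confounded dimensions, although the real mask is learned rather than hand-set.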
Modeling Multiple Views via Implicitly Preserving Global Consistency and Local Complementarity
While self-supervised learning techniques are often used to mine implicit
knowledge from unlabeled data by modeling multiple views, it is unclear how to
perform effective representation learning in a complex and inconsistent
context. To this end, we propose the consistency and
complementarity network (CoCoNet), which leverages regularization that preserves strict global inter-view
consistency and local cross-view complementarity to
comprehensively learn representations from multiple views. On the global stage,
we posit that the crucial knowledge is implicitly shared among views, and
that enhancing the encoder to capture such knowledge from data can improve the
discriminability of the learned representations. Hence, preserving the global
consistency of multiple views ensures the acquisition of common knowledge.
CoCoNet aligns the probability distributions of views by utilizing an
efficient discrepancy measure based on the generalized sliced
Wasserstein distance. Lastly, on the local stage, we propose a heuristic
complementarity factor, which combines cross-view discriminative knowledge and
guides the encoders to learn not only view-wise discriminability but also
cross-view complementary information. Theoretically, we provide
information-theoretic analyses of the proposed CoCoNet. Empirically, to
investigate the improvement gains of our approach, we conduct extensive
experimental validations, which demonstrate that CoCoNet outperforms
state-of-the-art self-supervised methods by a significant margin and indicate that
such implicit consistency and complementarity preserving regularization can
enhance the discriminability of latent representations.
Comment: Accepted by IEEE Transactions on Knowledge and Data Engineering
(TKDE) 2022; see https://ieeexplore.ieee.org/document/985763
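The distribution-alignment step rests on the sliced Wasserstein distance, which can be sketched as below. This is the plain sliced variant with random linear projections, swapped in for illustration. CoCoNet uses the generalized sliced Wasserstein distance, which replaces the linear projections with nonlinear ones. The sketch assumes equal-size samples from the two views, so each 1-D Wasserstein distance reduces to matching sorted values. The function names and defaults are hypothetical.

```python
import math
import random


def wasserstein_1d(xs, ys, p=2):
    """W_p between two equal-size 1-D empirical samples: match sorted values."""
    xs, ys = sorted(xs), sorted(ys)
    return (sum(abs(a - b) ** p for a, b in zip(xs, ys)) / len(xs)) ** (1.0 / p)


def random_direction(dim, rng):
    """Uniformly random unit vector (normalized Gaussian sample)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def sliced_wasserstein(view_a, view_b, n_projections=50, p=2, seed=0):
    """Average W_p^p over random 1-D projections, then take the p-th root."""
    rng = random.Random(seed)
    dim = len(view_a[0])
    total = 0.0
    for _ in range(n_projections):
        theta = random_direction(dim, rng)
        proj_a = [sum(t * x for t, x in zip(theta, pt)) for pt in view_a]
        proj_b = [sum(t * x for t, x in zip(theta, pt)) for pt in view_b]
        total += wasserstein_1d(proj_a, proj_b, p) ** p
    return (total / n_projections) ** (1.0 / p)


# Usage: the second view is the first one shifted by (3, 4).
view_a = [[0.0, 0.0], [1.0, 0.5], [2.0, -1.0]]
view_b = [[x + 3.0, y + 4.0] for x, y in view_a]
print(sliced_wasserstein(view_a, view_b))
```

Because each projection is one-dimensional, the per-slice cost is a sort rather than an optimal-transport solve. That efficiency is the usual reason sliced variants are chosen for aligning distributions.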
Introducing Expertise Logic into Graph Representation Learning from A Causal Perspective
Benefiting from the injection of human prior knowledge, graphs, as derived
discrete data, are semantically dense so that models can efficiently learn the
semantic information from such data. Accordingly, graph neural networks (GNNs)
have achieved impressive success in various fields. Revisiting GNN
learning paradigms, we find that the relationship between human expertise
and the knowledge modeled by GNNs remains unclear. To this end, we
introduce motivating experiments and derive an empirical observation that
human expertise is gradually learned by GNNs in general domains. By further
observing the effects of introducing expertise logic into graph
representation learning, we conclude that leading GNNs to learn human
expertise can improve model performance. By exploring the intrinsic
mechanism behind these observations, we formulate a Structural Causal Model
for the graph representation learning paradigm. Following this theoretical
guidance, we introduce an auxiliary causal logic learning
paradigm that helps the model learn the expertise logic causally related to
the graph representation learning task. In practice, a counterfactual
technique is further applied to tackle the insufficient training issue during
optimization. Extensive experiments on crafted and real-world domains
support the consistent effectiveness of the proposed method.
Supporting Vision-Language Model Inference with Causality-pruning Knowledge Prompt
Vision-language models are pre-trained by aligning image-text pairs in a
common space so that the models can deal with open-set visual concepts by
learning semantic information from textual labels. To boost the transferability
of these models on downstream tasks in a zero-shot manner, recent works explore
generating fixed or learnable prompts, i.e., classification weights are
synthesized from natural language describing task-relevant categories, to
reduce the gap between tasks in the training and test phases. However, how and
what prompts can improve inference performance remains unclear. In this paper,
we explore and clarify the importance of including
semantic information in prompts, whereas existing prompt methods generate prompts
without exploring the semantic information of textual labels. A challenging
issue is that manually constructing prompts with rich semantic information
requires domain expertise and is extremely time-consuming. To this end, we
propose Causality-pruning Knowledge Prompt (CapKP) for adapting pre-trained
vision-language models to downstream image recognition. CapKP retrieves an
ontological knowledge graph by treating the textual label as a query to explore
task-relevant semantic information. To further refine the derived semantic
information, CapKP introduces causality-pruning by following the first
principle of Granger causality. Empirically, we conduct extensive evaluations
to demonstrate the effectiveness of CapKP; e.g., with 8 shots, CapKP
outperforms the manual-prompt method by 12.51% and the learnable-prompt method
by 1.39% on average. Experimental analyses further demonstrate the superiority
of CapKP in domain generalization compared to benchmark approaches.
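The idea that prompts act as classification weights can be sketched as follows. The embedding vectors are toy values standing in for the outputs of a hypothetical text and image encoder. Averaging several prompts per class is one common way to combine prompts and is assumed here for illustration. This is not CapKP's knowledge-graph retrieval or causality pruning.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def class_weight(prompt_embeddings):
    """Average the normalized embeddings of a class's prompts, then re-normalize."""
    units = [normalize(e) for e in prompt_embeddings]
    mean = [sum(col) / len(units) for col in zip(*units)]
    return normalize(mean)


def zero_shot_predict(image_feature, prompts_by_class):
    """Score each class by the cosine between the image and its prompt-derived weight."""
    img = normalize(image_feature)
    scores = {label: sum(a * b for a, b in zip(img, class_weight(embs)))
              for label, embs in prompts_by_class.items()}
    return max(scores, key=scores.get), scores


# Usage with toy 2-D "embeddings" (hypothetical encoder outputs).
prompts = {"cat": [[1.0, 0.0], [0.9, 0.2]], "dog": [[0.0, 1.0]]}
label, scores = zero_shot_predict([1.0, 0.1], prompts)
print(label, scores)
```

Since each class weight is derived only from text, new categories can be added without retraining. The quality of the prompt text, which CapKP aims to enrich with task-relevant semantics, then directly shapes the classifier.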
MetAug: Contrastive Learning via Meta Feature Augmentation
What matters for contrastive learning? We argue that contrastive learning
heavily relies on informative features, or "hard" (positive or negative)
features. Early works include more informative features by applying complex
data augmentations and large batch sizes or memory banks, and recent works design
elaborate sampling approaches to explore informative features. The key
challenge toward exploring such features is that the source multi-view data is
generated by applying random data augmentations, making it infeasible to always
add useful information to the augmented data. Consequently, the informativeness
of features learned from such augmented data is limited. In response, we
propose to directly augment the features in latent space, thereby learning
discriminative representations without a large amount of input data. We use
a meta-learning technique to build the augmentation generator, which updates its
network parameters by considering the performance of the encoder. However,
insufficient input data may lead the encoder to learn collapsed features and
thereby cause the augmentation generator to malfunction. A new margin-injected
regularization is further added in the objective function to avoid the encoder
learning a degenerate mapping. To contrast all features in one gradient
back-propagation step, we adopt the proposed optimization-driven unified
contrastive loss instead of the conventional contrastive loss. Empirically, our
method achieves state-of-the-art results on several benchmark datasets.
Comment: Accepted by ICML 202
Using covariance weighted euclidean distance to assess the dissimilarity between integral experiments
Integral experiments, especially criticality experiments, are very helpful in designing new nuclear reactors or criticality assemblies. For most high-enriched uranium metal experiments, the calculational uncertainty of the integral parameter introduced by nuclear data uncertainty is larger than the experimental uncertainty; therefore, integral experiments remain very useful. Many integral experiments have been performed and documented, so which of them should be used in a given application must be considered carefully. For instance, if the aim of the application is to validate the criticality design of a new reactor, integral experiments that are similar to the new reactor should be used. Several similarity measures have been used to assess the similarity between integral experiments, such as the E, G, and C similarity measures. However, there is no standard rule for choosing which similarity measure to use in a specific application. Another shortcoming of these measures is that the thresholds used to judge whether integral experiments are similar to each other have no clear physical meaning. In this paper, we analyze the existing similarity measures used to assess the similarity between integral experiments and test other similarity and dissimilarity measures that have been used in other research fields. After testing the Tanimoto similarity measure and the Euclidean distance, we find that the covariance weighted Euclidean distance is well suited to assessing the dissimilarity between integral experiments and that the physical meaning of its threshold is clear. We recommend using the covariance weighted Euclidean distance to assess the dissimilarity between integral experiments.
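A covariance weighted Euclidean distance can be sketched as follows. The exact definition used in the paper is not given here. The sketch assumes sensitivity vectors s_A and s_B (sensitivities of k_eff to nuclear data) and a relative covariance matrix C, and takes d = sqrt((s_A - s_B)^T C (s_A - s_B)). To first order, this is the standard deviation of the relative k_eff difference between the two systems induced by nuclear-data uncertainty, which is one way to read a threshold with clear physical meaning. For comparison, the sketch also includes the E measure (cosine of the sensitivity vectors) and a correlation-style measure c_k, assumed here to correspond to the C measure. All function names are hypothetical.

```python
import math


def quad(u, cov, v):
    """u^T C v for plain-list vectors and a square covariance matrix."""
    return sum(u[i] * cov[i][j] * v[j] for i in range(len(u)) for j in range(len(v)))


def cov_weighted_distance(s_a, s_b, cov):
    """sqrt((s_a - s_b)^T C (s_a - s_b)); clamped at 0 against rounding."""
    diff = [a - b for a, b in zip(s_a, s_b)]
    return math.sqrt(max(0.0, quad(diff, cov, diff)))


def c_k(s_a, s_b, cov):
    """Covariance-weighted correlation between two sensitivity vectors."""
    return quad(s_a, cov, s_b) / math.sqrt(quad(s_a, cov, s_a) * quad(s_b, cov, s_b))


def e_similarity(s_a, s_b):
    """Cosine of the angle between two sensitivity vectors."""
    dot = sum(a * b for a, b in zip(s_a, s_b))
    return dot / math.sqrt(sum(a * a for a in s_a) * sum(b * b for b in s_b))


# Usage: toy 2-group sensitivities and a diagonal relative covariance (hypothetical).
cov = [[0.0004, 0.0], [0.0, 0.0009]]
s = [0.3, 0.1]
doubled = [0.6, 0.2]
print(c_k(s, doubled, cov), e_similarity(s, doubled))
print(cov_weighted_distance(s, doubled, cov))
```

In this toy case, doubling every sensitivity leaves both E and c_k at 1, because they are scale-invariant. The distance, however, stays positive and is expressed in k_eff units. This is one illustration of how a distance can carry information that correlation-style measures discard.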
Robust Causal Graph Representation Learning against Confounding Effects
The prevailing graph neural network models have achieved significant progress in graph representation learning. However, in this paper, we uncover a previously overlooked phenomenon: a pre-trained graph representation learning model tested on full graphs underperforms the model tested on well-pruned graphs. This observation reveals that graphs contain confounders, which may interfere with the model's learning of semantic information, and that current graph representation learning methods have not eliminated their influence. To tackle this issue, we propose Robust Causal Graph Representation Learning (RCGRL) to learn graph representations that are robust against confounding effects. RCGRL introduces an active approach to generating instrumental variables under unconditional moment restrictions, which enables the graph representation learning model to eliminate confounders, thereby capturing discriminative information that is causally related to downstream predictions. We offer theorems and proofs to guarantee the theoretical effectiveness of the proposed approach. Empirically, we conduct extensive experiments on a synthetic dataset and multiple benchmark datasets. Experimental results demonstrate the effectiveness and generalization ability of RCGRL. Our code is available at https://github.com/hang53/RCGRL
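RCGRL's use of instrumental variables builds on a classic idea that a one-dimensional example can illustrate. The sketch below is the textbook instrumental-variable (Wald) estimator, not RCGRL's procedure. With centered variables, the sample analog of the unconditional moment restriction E[z(y - beta x)] = 0 gives beta = cov(z, y) / cov(z, x). The toy data are hypothetical. A confounder u drives both x and y, so ordinary least squares is biased. The instrument z affects y only through x and is uncorrelated with u, so it recovers the true slope of 2.

```python
def mean(xs):
    return sum(xs) / len(xs)


def covariance(a, b):
    """Population covariance of two equal-length samples."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def ols_slope(x, y):
    """Least-squares slope: biased when x is correlated with the error term."""
    return covariance(x, y) / covariance(x, x)


def iv_slope(z, x, y):
    """Wald/IV slope: uses only the variation in x that is driven by z."""
    return covariance(z, y) / covariance(z, x)


# Toy data: true effect of x on y is 2; u confounds x and y; z is a valid instrument.
z = [1.0, -1.0, 1.0, -1.0]
u = [1.0, 1.0, -1.0, -1.0]  # uncorrelated with z in this sample
x = [zi + ui for zi, ui in zip(z, u)]
y = [2.0 * xi + ui for xi, ui in zip(x, u)]
print(ols_slope(x, y), iv_slope(z, x, y))  # 2.5 2.0
```

The confounder inflates the naive slope, while the instrument-based estimate isolates the causal effect. This is the property RCGRL seeks by generating instruments for graph data.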
Bootstrapping Informative Graph Augmentation via A Meta Learning Approach
Recent works explore learning graph representations in a self-supervised
manner. In graph contrastive learning, benchmark methods apply various graph
augmentation approaches. However, most of the augmentation methods are
non-learnable, which can lead to unhelpful augmented
graphs. Such augmentation may degrade the representation ability of graph
contrastive learning methods. Therefore, we propose to generate
augmented graphs with a learnable graph augmenter, called MEta Graph Augmentation
(MEGA). We then clarify that a "good" graph augmentation must have uniformity
at the instance level and informativeness at the feature level. To this end, we
propose a novel approach to learning a graph augmenter that can generate an
augmentation with uniformity and informativeness. The objective of the graph
augmenter is to promote our feature extraction network to learn a more
discriminative feature representation, which motivates us to propose a
meta-learning paradigm. Empirically, the experiments across multiple benchmark
datasets demonstrate that MEGA outperforms the state-of-the-art methods in
graph self-supervised learning tasks. Further experimental studies demonstrate the
effectiveness of the different terms of MEGA.
Comment: Accepted by the International Joint Conference on Artificial Intelligence
(IJCAI) 202
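The instance-level uniformity requirement can be made concrete with the uniformity loss of Wang and Isola (2020), sketched below. Whether MEGA uses exactly this formulation is an assumption. The sketch only shows how uniformity on the unit hypersphere is commonly measured, with `t = 2` as an assumed default.

```python
import math


def l2_normalize(v):
    """Project a vector onto the unit hypersphere."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def uniformity_loss(embeddings, t=2.0):
    """log of the mean Gaussian potential exp(-t * ||zi - zj||^2) over distinct pairs.

    Lower values mean the normalized embeddings are spread more uniformly;
    fully collapsed embeddings give the maximum value, log(1) = 0.
    """
    z = [l2_normalize(e) for e in embeddings]
    potentials = []
    for i in range(len(z)):
        for j in range(i + 1, len(z)):
            sq_dist = sum((a - b) ** 2 for a, b in zip(z[i], z[j]))
            potentials.append(math.exp(-t * sq_dist))
    return math.log(sum(potentials) / len(potentials))


# Usage: collapsed versus spread-out toy embeddings.
collapsed = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
spread = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
print(uniformity_loss(collapsed), uniformity_loss(spread))
```

A degenerate augmenter that maps every graph to nearly the same representation drives this quantity toward its maximum. Penalizing it therefore discourages augmentations that collapse instance-level differences.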